The Archaeotools project: faceted classification and natural language processing in an archaeological context.

Authors

  • S Jeffrey
  • J Richards
  • F Ciravegna
  • S Waller
  • S Chapman
  • Z Zhang
Abstract

This paper describes 'Archaeotools', a major e-Science project in archaeology. The aim of the project is to use faceted classification and natural language processing to create an advanced infrastructure for archaeological research. Specifically, it aims to integrate over 1 × 10⁶ structured database records referring to archaeological sites and monuments in the UK with information extracted from semi-structured grey literature reports and unstructured antiquarian journal accounts, all within a single faceted browser interface. The project has revealed how variable the level of vocabulary control and standardization currently is across national and local monument inventories. Nonetheless, it has demonstrated that, because archaeology has relatively well-defined ontologies and thesauri, information extraction techniques can achieve a high level of success. This has great potential for unlocking and making accessible the information held in grey literature and antiquarian accounts, and has lessons for allied disciplines.
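As a minimal sketch of what a faceted browser does with such records (not the Archaeotools implementation), the example below filters records by the facet values a user has selected and counts the remaining values of each facet, which is what a browser displays next to each option. The record fields "what", "where" and "when" and all values are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical monument records: each maps a facet name to one value.
# Field names and values are illustrative assumptions, not project data.
Record = Dict[str, str]


def filter_records(records: List[Record], selection: Dict[str, str]) -> List[Record]:
    """Keep records that match every selected facet value (conjunctive filtering)."""
    return [r for r in records if all(r.get(f) == v for f, v in selection.items())]


def facet_counts(records: List[Record], selection: Dict[str, str],
                 facets: List[str]) -> Dict[str, Counter]:
    """Count each facet's values among the records matching the current selection."""
    matching = filter_records(records, selection)
    return {f: Counter(r[f] for r in matching if f in r) for f in facets}


records = [
    {"what": "hillfort", "where": "Wiltshire", "when": "Iron Age"},
    {"what": "barrow", "where": "Wiltshire", "when": "Bronze Age"},
    {"what": "villa", "where": "Yorkshire", "when": "Roman"},
    {"what": "barrow", "where": "Yorkshire", "when": "Bronze Age"},
]

# Selecting "Bronze Age" on the "when" facet narrows the other facets' counts.
counts = facet_counts(records, {"when": "Bronze Age"}, ["what", "where"])
print(counts["what"])   # Counter({'barrow': 2})
print(counts["where"])  # Counter({'Wiltshire': 1, 'Yorkshire': 1})
```

Treating selections as a conjunction means each extra facet choice can only narrow the result set, which is the usual behaviour of faceted interfaces; a real system would index records rather than scan them on every click.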

Similar sources

The Archaeotools project: faceted classification and natural language processing in an archaeological context

References: this article cites 4 articles (http://rsta.royalsocietypublishing.org/content/367/1897/2507.full.html#ref-list-1). Subject collections: articles on similar topics can be found in the human-computer interaction and theory of computing collections...

Integrating archaeological literature into resource discovery interfaces

There exists a large and underutilized resource of archaeological literature, both formal, such as scholarly journals, and less formal, in the form of ‘grey literature’. In the archaeological domain the vast majority of this literature contains some geo-spatial element as well as the expected temporal information and therefore its ease of discovery would be greatly enhanced were it accessible via...

Perspectives on crowdsourcing annotations for natural language processing

Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their method of motivating subjects to contribute and the scale of their applications. To date, however, there has yet to be a study that helps the practitioner to decide what form an annotation application should take to...

MultiScien: a Bi-Lingual Natural Language Processing System for Mining and Enrichment of Scientific Collections

In the current online Open Science context, scientific datasets and tools for deep text analysis, visualization and exploitation play a major role. We present a system for deep analysis and annotation of scientific text collections. We also introduce the first version of the SEPLN Anthology, a bi-lingual (Spanish and English) fully annotated text resource in the field of natural language proces...

Improved Document Representation for Classification Tasks for the Intelligence Community

Research within a larger, multi-faceted risk assessment project for the Intelligence Community (IC) combines Natural Language Processing (NLP) and Machine Learning techniques to detect potentially malicious shifts in the semantic content of information either accessed or produced by insiders within an organization. Our hypothesis is that the use of fewer, more discriminative linguistic features...

Journal title:
  • Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Volume 367, Issue 1897

Pages: -

Publication date: 2009